Tuesday, June 27, 2006

Optimal histogram bin width

Kevin Knuth wrote a paper about finding the optimal number of bins to represent data in a histogram (Optimal Data-Based Binning for Histograms). He starts from a piecewise-constant density model and derives the (Bayesian) posterior probability of the number of bins given the data (equation 36, which is actually the log of the posterior). This posterior is then maximized to find the number of bins that best models the data.
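As a rough sketch of that maximization, the snippet below evaluates the relative log posterior for equal-width bins as I recall it from the paper (N log M + log Γ(M/2) - M log Γ(1/2) - log Γ(N + M/2) + Σ log Γ(n_k + 1/2), up to an additive constant) and scans over M. The function names, the max_bins cutoff, and the Gaussian test sample are my own choices for illustration, not anything from the paper.

```python
import math
import random

def log_posterior(data, m):
    """Relative log posterior for m equal-width bins (additive constant dropped)."""
    n = len(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / m
    counts = [0] * m
    for x in data:
        # Clamp so the maximum value lands in the last bin.
        k = min(int((x - lo) / width), m - 1)
        counts[k] += 1
    return (n * math.log(m)
            + math.lgamma(m / 2)
            - m * math.lgamma(0.5)
            - math.lgamma(n + m / 2)
            + sum(math.lgamma(c + 0.5) for c in counts))

def optimal_bins(data, max_bins=50):
    """Brute-force scan for the bin count with the largest log posterior."""
    return max(range(1, max_bins + 1), key=lambda m: log_posterior(data, m))

# Illustrative data only: 1000 draws from a unit Gaussian.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
best = optimal_bins(sample)
print("bins with the highest log posterior:", best)
```

A brute-force scan is enough here because each evaluation is cheap and the number of candidate bin counts is small.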


The article also investigates the number of data points needed for a reliable estimate of the density. The recommendation is 100-150 points if the distribution is Gaussian.


It would be interesting to apply this method to radial distribution functions. However, the assumption of a constant volume for each bin is not met in this case. There are several ways this could be adjusted (scale each bin count by the bin volume, or use non-uniform bin spacing to maintain constant volume), but I'm not sure they are valid.
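To make the two possible adjustments concrete, here is a small sketch of the geometry only. It assumes spherical shells in three dimensions, and the function names are made up for this example. It says nothing about whether either adjustment is statistically valid in the posterior.

```python
import math

def shell_volumes(edges):
    """Volume of each spherical shell between consecutive radial bin edges."""
    return [4.0 / 3.0 * math.pi * (b**3 - a**3)
            for a, b in zip(edges, edges[1:])]

def scale_counts(counts, edges):
    """Adjustment 1: divide each bin count by the volume of its shell."""
    return [c / v for c, v in zip(counts, shell_volumes(edges))]

def equal_volume_edges(r_max, m):
    """Adjustment 2: m bin edges spaced so that every shell has the same volume."""
    return [r_max * (k / m) ** (1.0 / 3.0) for k in range(m + 1)]

# Uniform spacing: shell volumes grow with radius.
print(shell_volumes([0.0, 1.0, 2.0, 3.0]))
# Equal-volume spacing: bins get narrower with radius.
print(equal_volume_edges(3.0, 3))
```

With equal-volume edges the outer bins become narrow, so the radial resolution is no longer uniform. That is a trade-off to keep in mind when comparing the two approaches.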


Alternatively, the discussion references other algorithms for dealing with variable bin-width models (which may be better for resolving multiple peaks anyway).

Thursday, June 08, 2006

QMC derivation notes

I posted a document I wrote in grad school, Notes on the wavefunction and local energy. It contains derivations of various QMC formulas, particularly the first and second derivatives for several forms of wavefunctions.


I'm posting this for two reasons. The first is in case anyone finds the formulas useful when working on a QMC code.


The second is related to the process of scientific programming. When writing a QMC code, I found it useful to record the formulas and derivations in a neatly typeset form. The next step was turning the equations into computer code. Then, of course, came testing and debugging.
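As a toy illustration of that derive, code, test loop (not taken from the notes), consider a one-dimensional harmonic oscillator with the trial wavefunction psi(x) = exp(-a x^2), in units where hbar = m = omega = 1. Working through the second derivative by hand gives the local energy E_L(x) = a + x^2 (1/2 - 2 a^2). The sketch below codes that formula and checks it against a finite-difference second derivative. The function names and step size are arbitrary choices.

```python
import math

def psi(x, a):
    """Trial wavefunction exp(-a x^2)."""
    return math.exp(-a * x * x)

def local_energy(x, a):
    """Local energy from the hand derivation: a + x^2 (1/2 - 2 a^2)."""
    return a + x * x * (0.5 - 2.0 * a * a)

def local_energy_numeric(x, a, h=1e-3):
    """Same quantity with the second derivative taken by central differences."""
    d2 = (psi(x + h, a) - 2.0 * psi(x, a) + psi(x - h, a)) / (h * h)
    return -0.5 * d2 / psi(x, a) + 0.5 * x * x

# Compare the derived formula against the numerical check at a few points.
for x in (0.0, 0.7, 1.5):
    analytic = local_energy(x, 0.4)
    numeric = local_energy_numeric(x, 0.4)
    print(x, analytic, numeric)
```

A finite-difference comparison like this catches sign and factor-of-two slips in the derivation, which are the kind of errors that tend to creep in when moving from typeset formulas to code.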


This workflow is what I would like to capture with the Programming in Mathematical Notation work. The document with derivations could be written in content MathML (or something more amenable to human manipulation). Ideally the computer could then assist with verifying the derivations for correctness, and with converting the equations into computer code.


And as long as I'm dreaming, I'd really like a wiki-like interface for creating and editing such a document (making a set of hyperlinked pages rather than a single linear document).